Margin-constrained Random Projections And Very Sparse Random Projections

Authors

  • Ping Li
  • Trevor J. Hastie
  • Kenneth W. Church
Abstract

We propose methods for improving both the accuracy and the efficiency of random projections, a popular dimension reduction technique in machine learning and data mining that is particularly useful for estimating pairwise distances. Let $A \in \mathbb{R}^{n \times D}$ be our $n$ points in $D$ dimensions. The method multiplies $A$ by a random matrix $R \in \mathbb{R}^{D \times k}$, reducing the $D$ dimensions down to just $k$. $R$ typically consists of i.i.d. entries drawn from $N(0, 1)$. The cost of the projection mapping is $O(nDk)$. This study proposes an improved estimator of pairwise distances with provably smaller variances (errors) by taking advantage of the marginal information. We also propose very sparse random projections, which replace the $N(0, 1)$ entries in $R$ with entries in $\{-1, 0, 1\}$ with probabilities $\{\frac{1}{2\sqrt{D}},\ 1 - \frac{1}{\sqrt{D}},\ \frac{1}{2\sqrt{D}}\}$, achieving a significant $\sqrt{D}$-fold speedup with little loss in accuracy. Previously, Achlioptas proposed sparse random projections using entries in $\{-1, 0, 1\}$ with probabilities $\{\frac{1}{6},\ \frac{2}{3},\ \frac{1}{6}\}$, achieving a threefold speedup.
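The very sparse scheme described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names (`sparse_projection_matrix`, `project`, `estimate_sq_distance`) are hypothetical, and the rescaling by $\sqrt{D}/k$ in the distance estimate is an assumption that follows from the fact that each entry of $R$ has second moment $1/\sqrt{D}$; the abstract itself does not state a normalization. The margin-based estimator is not shown, because the abstract does not give its form.

```python
import math
import random


def sparse_projection_matrix(D, k, rng):
    """Draw R (D x k) with entries in {-1, 0, +1} and probabilities
    {1/(2*sqrt(D)), 1 - 1/sqrt(D), 1/(2*sqrt(D))}.

    Only nonzero entries are stored: each column is a list of
    (row_index, sign) pairs.
    """
    p_nonzero = 1.0 / math.sqrt(D)
    columns = []
    for _ in range(k):
        column = []
        for i in range(D):
            if rng.random() < p_nonzero:
                # A nonzero entry is +1 or -1 with equal probability.
                column.append((i, 1 if rng.random() < 0.5 else -1))
        columns.append(column)
    return columns


def project(x, columns):
    """Compute the k-dimensional projection x R, touching only nonzeros."""
    return [sum(sign * x[i] for i, sign in column) for column in columns]


def estimate_sq_distance(vx, vy, D):
    """Estimate the squared Euclidean distance between the original points
    from their projections.

    Assumption: since E[r^2] = 1/sqrt(D) for each entry r of R, the sum of
    squared projected differences is rescaled by sqrt(D)/k so that the
    estimate is unbiased.
    """
    k = len(vx)
    return math.sqrt(D) / k * sum((a - b) ** 2 for a, b in zip(vx, vy))
```

Each column holds about $\sqrt{D}$ nonzero entries on average, so projecting one point costs roughly $k\sqrt{D}$ additions instead of $kD$ multiplications, which is where the $\sqrt{D}$-fold speedup in the abstract comes from.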

Similar resources

Very Sparse Stable Random Projections, Estimators and Tail Bounds for Stable Random Projections

The method of stable random projections [39, 41] is popular for data streaming computations, data mining, and machine learning. For example, in data streaming, stable random projections offer a unified, efficient, and elegant methodology for approximating the $l_\alpha$ norm of a single data stream, or the $l_\alpha$ distance between a pair of streams, for any $0 < \alpha \le 2$. [18] and [20] applied stable random pro...

Random Projections for Anchor-based Topic Inference

Recent spectral topic discovery methods are extremely fast at processing large document corpora, but scale poorly with the size of the input vocabulary. Random projections are vital to ensure speed and limit memory usage. We empirically evaluate several methods for generating random projections and measure the effect of parameters such as sparsity and dimensionality. We find that methods with s...

Sparse signal recovery using sparse random projections

Memory and Computation Efficient PCA via Very Sparse Random Projections

Algorithms that can efficiently recover principal components in very high-dimensional, streaming, and/or distributed data settings have become an important topic in the literature. In this paper, we propose an approach to principal component estimation that utilizes projections onto very sparse random vectors with Bernoulli-generated nonzero entries. Indeed, our approach is simultaneously effic...

Conditional Random Sampling: A Sketch-based Sampling Technique for Sparse Data

We develop Conditional Random Sampling (CRS), a technique particularly suitable for sparse data. In large-scale applications, the data are often highly sparse. CRS combines sketching and sampling in that it converts sketches of the data into conditional random samples online in the estimation stage, with the sample size determined retrospectively. This paper focuses on approximating p...

Publication date: 2006